Overview

Dataset statistics

Number of variables: 36
Number of observations: 835
Missing cells: 3485
Missing cells (%): 11.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 235.0 KiB
Average record size in memory: 288.2 B
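
The two derived figures above follow from the raw counts. The sketch below is a consistency check, not the profiler's own code: it assumes the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes. The variable names are illustrative.

```python
# Raw figures copied from the dataset statistics.
n_variables = 36
n_observations = 835
missing_cells = 3485
total_kib = 235.0

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, spread evenly over the rows.
avg_record_bytes = total_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values reported in the table.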

Variable types

BOOL: 24
NUM: 11
CAT: 1

Reproduction

Analysis started: 2020-04-27 20:42:39.206531
Analysis finished: 2020-04-27 20:43:12.090217
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

STDs (number) is highly correlated with STDs [High Correlation]
STDs is highly correlated with STDs (number) and 1 other field [High Correlation]
STDs:vulvo-perineal condylomatosis is highly correlated with STDs:condylomatosis [High Correlation]
STDs:condylomatosis is highly correlated with STDs:vulvo-perineal condylomatosis [High Correlation]
STDs: Number of diagnosis is highly correlated with STDs [High Correlation]
STDs: Time since last diagnosis is highly correlated with STDs: Time since first diagnosis [High Correlation]
STDs: Time since first diagnosis is highly correlated with STDs: Time since last diagnosis [High Correlation]
Number of sexual partners has 25 (3.0%) missing values [Missing]
Num of pregnancies has 56 (6.7%) missing values [Missing]
Smokes has 13 (1.6%) missing values [Missing]
Smokes (years) has 13 (1.6%) missing values [Missing]
Smokes (packs/year) has 13 (1.6%) missing values [Missing]
Hormonal Contraceptives has 103 (12.3%) missing values [Missing]
Hormonal Contraceptives (years) has 103 (12.3%) missing values [Missing]
IUD has 112 (13.4%) missing values [Missing]
IUD (years) has 112 (13.4%) missing values [Missing]
STDs has 100 (12.0%) missing values [Missing]
STDs (number) has 100 (12.0%) missing values [Missing]
STDs:condylomatosis has 100 (12.0%) missing values [Missing]
STDs:cervical condylomatosis has 100 (12.0%) missing values [Missing]
STDs:vaginal condylomatosis has 100 (12.0%) missing values [Missing]
STDs:vulvo-perineal condylomatosis has 100 (12.0%) missing values [Missing]
STDs:syphilis has 100 (12.0%) missing values [Missing]
STDs:pelvic inflammatory disease has 100 (12.0%) missing values [Missing]
STDs:genital herpes has 100 (12.0%) missing values [Missing]
STDs:molluscum contagiosum has 100 (12.0%) missing values [Missing]
STDs:AIDS has 100 (12.0%) missing values [Missing]
STDs:HIV has 100 (12.0%) missing values [Missing]
STDs:Hepatitis B has 100 (12.0%) missing values [Missing]
STDs:HPV has 100 (12.0%) missing values [Missing]
STDs: Time since first diagnosis has 764 (91.5%) missing values [Missing]
STDs: Time since last diagnosis has 764 (91.5%) missing values [Missing]
Num of pregnancies has 16 (1.9%) zeros [Zeros]
Smokes (years) has 699 (83.7%) zeros [Zeros]
Smokes (packs/year) has 699 (83.7%) zeros [Zeros]
Hormonal Contraceptives (years) has 255 (30.5%) zeros [Zeros]
IUD (years) has 640 (76.6%) zeros [Zeros]
STDs (number) has 656 (78.6%) zeros [Zeros]

Variables

Age
Real number (ℝ≥0)

Distinct count: 44
Unique (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.023952095808383
Minimum: 13
Maximum: 84
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 13
5-th percentile: 16
Q1: 21
median: 26
Q3: 32
95-th percentile: 41
Maximum: 84
Range: 71
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.482986285
Coefficient of variation (CV): 0.3139062064
Kurtosis: 4.866191771
Mean: 27.0239521
Median Absolute Deviation (MAD): 6.59776973
Skewness: 1.403917208
Sum: 22565
Variance: 71.9610563
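
Several of the Age statistics are simple functions of the others. The sketch below recomputes them from the reported values as a sanity check. It assumes the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative, not taken from the profiler.

```python
# Values copied from the Age statistics in this report.
minimum, maximum = 13, 84
q1, q3 = 21, 32
mean = 27.0239521
std = 8.482986285
total, n_observations = 22565, 835

value_range = maximum - minimum          # Range
iqr = q3 - q1                            # Interquartile range
cv = std / mean                          # Coefficient of variation
variance = std ** 2                      # Variance
mean_from_sum = total / n_observations   # Mean recomputed from Sum

print(value_range, iqr)
print(round(cv, 6), round(variance, 5), round(mean_from_sum, 6))
```

Each recomputed value agrees with the report to the precision shown there.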
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[13. 14.5 16.5 30.5 37.5 45.5 51.5 84. ], "bayesian blocks" binning strategy used)
Common values
Value Count Frequency (%)
23 54 6.5%
 
18 47 5.6%
 
20 45 5.4%
 
21 44 5.3%
 
19 43 5.1%
 
24 39 4.7%
 
26 38 4.6%
 
25 37 4.4%
 
30 35 4.2%
 
28 35 4.2%
 
Other values (34) 418 50.1%
 
Minimum 5 values
Value Count Frequency (%)
13 1 0.1%
 
14 5 0.6%
 
15 16 1.9%
 
16 21 2.5%
 
17 30 3.6%
 
Maximum 5 values
Value Count Frequency (%)
84 1 0.1%
 
79 1 0.1%
 
70 2 0.2%
 
59 1 0.1%
 
52 2 0.2%
 

Number of sexual partners
Real number (ℝ≥0)

MISSING
Distinct count: 12
Unique (%): 1.5%
Missing: 25
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.551851851851852
Minimum: 1.0
Maximum: 28.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 2
Q3: 3
95-th percentile: 5
Maximum: 28
Range: 27
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.676685994
Coefficient of variation (CV): 0.6570467612
Kurtosis: 69.28348345
Mean: 2.551851852
Median Absolute Deviation (MAD): 1.101975309
Skewness: 5.481625824
Sum: 2067
Variance: 2.811275924
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
2 266 31.9%
 
3 207 24.8%
 
1 193 23.1%
 
4 76 9.1%
 
5 44 5.3%
 
6 9 1.1%
 
7 7 0.8%
 
8 4 0.5%
 
9 1 0.1%
 
28 1 0.1%
 
Other values (2) 2 0.2%
 
(Missing) 25 3.0%
 
Minimum 5 values
Value Count Frequency (%)
1 193 23.1%
 
2 266 31.9%
 
3 207 24.8%
 
4 76 9.1%
 
5 44 5.3%
 
Maximum 5 values
Value Count Frequency (%)
28 1 0.1%
 
15 1 0.1%
 
10 1 0.1%
 
9 1 0.1%
 
8 4 0.5%
 

First sexual intercourse
Real number (ℝ≥0)

Distinct count: 21
Unique (%): 2.5%
Missing: 7
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.020531400966185
Minimum: 10.0
Maximum: 32.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 10
5-th percentile: 14
Q1: 15
median: 17
Q3: 18
95-th percentile: 22
Maximum: 32
Range: 22
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.817000335
Coefficient of variation (CV): 0.1655060156
Kurtosis: 4.274723378
Mean: 17.0205314
Median Absolute Deviation (MAD): 1.973292842
Skewness: 1.566654707
Sum: 14093
Variance: 7.93549089
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
15 157 18.8%
 
17 149 17.8%
 
18 134 16.0%
 
16 120 14.4%
 
14 72 8.6%
 
19 58 6.9%
 
20 35 4.2%
 
13 25 3.0%
 
21 20 2.4%
 
22 9 1.1%
 
Other values (11) 49 5.9%
 
Minimum 5 values
Value Count Frequency (%)
10 2 0.2%
 
11 2 0.2%
 
12 6 0.7%
 
13 25 3.0%
 
14 72 8.6%
 
Maximum 5 values
Value Count Frequency (%)
32 1 0.1%
 
29 5 0.6%
 
28 3 0.4%
 
27 6 0.7%
 
26 7 0.8%
 

Num of pregnancies
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 11
Unique (%): 1.4%
Missing: 56
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.3042362002567396
Minimum: 0.0
Maximum: 11.0
Zeros: 16
Zeros (%): 1.9%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 3
95-th percentile: 5
Maximum: 11
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.455817168
Coefficient of variation (CV): 0.6318003195
Kurtosis: 3.125795338
Mean: 2.3042362
Median Absolute Deviation (MAD): 1.125378806
Skewness: 1.395841172
Sum: 1795
Variance: 2.119403625
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
1 253 30.3%
 
2 235 28.1%
 
3 138 16.5%
 
4 74 8.9%
 
5 35 4.2%
 
6 18 2.2%
 
0 16 1.9%
 
7 6 0.7%
 
8 2 0.2%
 
10 1 0.1%
 
(Missing) 56 6.7%
 
Minimum 5 values
Value Count Frequency (%)
0 16 1.9%
 
1 253 30.3%
 
2 235 28.1%
 
3 138 16.5%
 
4 74 8.9%
 
Maximum 5 values
Value Count Frequency (%)
11 1 0.1%
 
10 1 0.1%
 
8 2 0.2%
 
7 6 0.7%
 
6 18 2.2%
 

Smokes
Boolean

MISSING
Distinct count: 2
Unique (%): 0.2%
Missing: 13
Missing (%): 1.6%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 699 83.7%
 
1 123 14.7%
 
(Missing) 13 1.6%
 

Smokes (years)
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 30
Unique (%): 3.6%
Missing: 13
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2538498706021899
Minimum: 0.0
Maximum: 37.0
Zeros: 699
Zeros (%): 83.7%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 10
Maximum: 37
Range: 37
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.140727168
Coefficient of variation (CV): 3.302410652
Kurtosis: 23.02762586
Mean: 1.253849871
Median Absolute Deviation (MAD): 2.145565105
Skewness: 4.396961176
Sum: 1030.664594
Variance: 17.14562148
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
0 699 83.7%
 
1.266972909 15 1.8%
 
5 9 1.1%
 
9 9 1.1%
 
1 8 1.0%
 
3 7 0.8%
 
2 7 0.8%
 
7 6 0.7%
 
16 6 0.7%
 
8 6 0.7%
 
Other values (20) 50 6.0%
 
(Missing) 13 1.6%
 
Minimum 5 values
Value Count Frequency (%)
0 699 83.7%
 
0.16 1 0.1%
 
0.5 3 0.4%
 
1 8 1.0%
 
1.266972909 15 1.8%
 
Maximum 5 values
Value Count Frequency (%)
37 1 0.1%
 
34 1 0.1%
 
32 1 0.1%
 
28 1 0.1%
 
24 1 0.1%
 

Smokes (packs/year)
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 62
Unique (%): 7.5%
Missing: 13
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.46582316094720194
Minimum: 0.0
Maximum: 37.0
Zeros: 699
Zeros (%): 83.7%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2.595
Maximum: 37
Range: 37
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.256273115
Coefficient of variation (CV): 4.843625873
Kurtosis: 111.709599
Mean: 0.4658231609
Median Absolute Deviation (MAD): 0.8109171984
Skewness: 9.181032079
Sum: 382.9066383
Variance: 5.090768368
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
0 699 83.7%
 
0.5132021277 18 2.2%
 
1 6 0.7%
 
3 5 0.6%
 
1.2 4 0.5%
 
0.2 4 0.5%
 
0.75 4 0.5%
 
0.05 4 0.5%
 
2 4 0.5%
 
12 3 0.4%
 
Other values (52) 71 8.5%
 
(Missing) 13 1.6%
 
Minimum 5 values
Value Count Frequency (%)
0 699 83.7%
 
0.001 1 0.1%
 
0.003 1 0.1%
 
0.025 1 0.1%
 
0.04 2 0.2%
 
Maximum 5 values
Value Count Frequency (%)
37 1 0.1%
 
22 1 0.1%
 
21 1 0.1%
 
19 1 0.1%
 
15 1 0.1%
 

Hormonal Contraceptives
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 103
Missing (%): 12.3%
Memory size: 6.6 KiB
Value Count Frequency (%)
1 477 57.1%
 
0 255 30.5%
 
(Missing) 103 12.3%
 

Hormonal Contraceptives (years)
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 40
Unique (%): 5.5%
Missing: 103
Missing (%): 12.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.302915848418033
Minimum: 0.0
Maximum: 30.0
Zeros: 255
Zeros (%): 30.5%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.5
Q3: 3
95-th percentile: 10
Maximum: 30
Range: 30
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.794179557
Coefficient of variation (CV): 1.647554581
Kurtosis: 8.82534389
Mean: 2.302915848
Median Absolute Deviation (MAD): 2.696952885
Skewness: 2.595193179
Sum: 1685.734401
Variance: 14.39579851
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
0 255 30.5%
 
1 76 9.1%
 
2 40 4.8%
 
0.25 40 4.8%
 
3 39 4.7%
 
5 33 4.0%
 
0.5 25 3.0%
 
0.08 25 3.0%
 
6 24 2.9%
 
4 22 2.6%
 
Other values (30) 153 18.3%
 
(Missing) 103 12.3%
 
Minimum 5 values
Value Count Frequency (%)
0 255 30.5%
 
0.08 25 3.0%
 
0.16 16 1.9%
 
0.17 1 0.1%
 
0.25 40 4.8%
 
Maximum 5 values
Value Count Frequency (%)
30 1 0.1%
 
22 1 0.1%
 
20 4 0.5%
 
19 2 0.2%
 
17 1 0.1%
 

IUD
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 112
Missing (%): 13.4%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 640 76.6%
 
1 83 9.9%
 
(Missing) 112 13.4%
 

IUD (years)
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 26
Unique (%): 3.6%
Missing: 112
Missing (%): 13.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5276210235131397
Minimum: 0.0
Maximum: 19.0
Zeros: 640
Zeros (%): 76.6%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 4.9
Maximum: 19
Range: 19
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.965438815
Coefficient of variation (CV): 3.725095717
Kurtosis: 29.17991149
Mean: 0.5276210235
Median Absolute Deviation (MAD): 0.9403763327
Skewness: 4.934714026
Sum: 381.47
Variance: 3.862949734
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
0 640 76.6%
 
3 11 1.3%
 
2 10 1.2%
 
5 9 1.1%
 
1 8 1.0%
 
7 7 0.8%
 
8 7 0.8%
 
6 5 0.6%
 
4 5 0.6%
 
11 3 0.4%
 
Other values (16) 18 2.2%
 
(Missing) 112 13.4%
 
Minimum 5 values
Value Count Frequency (%)
0 640 76.6%
 
0.08 2 0.2%
 
0.16 1 0.1%
 
0.17 1 0.1%
 
0.25 1 0.1%
 
Maximum 5 values
Value Count Frequency (%)
19 1 0.1%
 
17 1 0.1%
 
15 1 0.1%
 
12 1 0.1%
 
11 3 0.4%
 

STDs
Boolean

HIGH CORRELATION
MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 656 78.6%
 
1 79 9.5%
 
(Missing) 100 12.0%
 

STDs (number)
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS
Distinct count: 5
Unique (%): 0.7%
Missing: 100
Missing (%): 12.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.18095238095238095
Minimum: 0.0
Maximum: 4.0
Zeros: 656
Zeros (%): 78.6%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5681526704
Coefficient of variation (CV): 3.139791073
Kurtosis: 11.17103969
Mean: 0.180952381
Median Absolute Deviation (MAD): 0.3230061548
Skewness: 3.352184749
Sum: 133
Variance: 0.3227974569
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
0 656 78.6%
 
2 37 4.4%
 
1 34 4.1%
 
3 7 0.8%
 
4 1 0.1%
 
(Missing) 100 12.0%
 
Minimum 5 values
Value Count Frequency (%)
0 656 78.6%
 
1 34 4.1%
 
2 37 4.4%
 
3 7 0.8%
 
4 1 0.1%
 
Maximum 5 values
Value Count Frequency (%)
4 1 0.1%
 
3 7 0.8%
 
2 37 4.4%
 
1 34 4.1%
 
0 656 78.6%
 

STDs:condylomatosis
Boolean

HIGH CORRELATION
MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 691 82.8%
 
1 44 5.3%
 
(Missing) 100 12.0%
 

STDs:cervical condylomatosis
Boolean

MISSING
Distinct count: 1
Unique (%): 0.1%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 735 88.0%
 
(Missing) 100 12.0%
 

STDs:vaginal condylomatosis
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 731 87.5%
 
1 4 0.5%
 
(Missing) 100 12.0%
 

STDs:vulvo-perineal condylomatosis
Boolean

HIGH CORRELATION
MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 692 82.9%
 
1 43 5.1%
 
(Missing) 100 12.0%
 

STDs:syphilis
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 717 85.9%
 
1 18 2.2%
 
(Missing) 100 12.0%
 

STDs:pelvic inflammatory disease
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 734 87.9%
 
1 1 0.1%
 
(Missing) 100 12.0%
 

STDs:genital herpes
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 734 87.9%
 
1 1 0.1%
 
(Missing) 100 12.0%
 

STDs:molluscum contagiosum
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 734 87.9%
 
1 1 0.1%
 
(Missing) 100 12.0%
 

STDs:AIDS
Boolean

MISSING
Distinct count: 1
Unique (%): 0.1%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 735 88.0%
 
(Missing) 100 12.0%
 

STDs:HIV
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 717 85.9%
 
1 18 2.2%
 
(Missing) 100 12.0%
 

STDs:Hepatitis B
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 734 87.9%
 
1 1 0.1%
 
(Missing) 100 12.0%
 

STDs:HPV
Boolean

MISSING
Distinct count: 2
Unique (%): 0.3%
Missing: 100
Missing (%): 12.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 733 87.8%
 
1 2 0.2%
 
(Missing) 100 12.0%
 

STDs: Number of diagnosis
Categorical

HIGH CORRELATION
Distinct count: 4
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 764 91.5%
 
1 68 8.1%
 
2 2 0.2%
 
3 1 0.1%
 

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode categories
Value Count Frequency (%)
Decimal_Number 4 100.0%

Unicode scripts
Value Count Frequency (%)
Common 4 100.0%

Unicode blocks
Value Count Frequency (%)
ASCII 4 100.0%

STDs: Time since first diagnosis
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 18
Unique (%): 25.4%
Missing: 764
Missing (%): 91.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.140845070422535
Minimum: 1.0
Maximum: 22.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 8
95-th percentile: 19
Maximum: 22
Range: 21
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.89502399
Coefficient of variation (CV): 0.959969503
Kurtosis: 0.6822786595
Mean: 6.14084507
Median Absolute Deviation (MAD): 4.60900615
Skewness: 1.326179119
Sum: 436
Variance: 34.75130785
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
1 15 1.8%
 
3 10 1.2%
 
2 9 1.1%
 
4 6 0.7%
 
7 5 0.6%
 
5 4 0.5%
 
16 4 0.5%
 
6 3 0.4%
 
8 3 0.4%
 
19 2 0.2%
 
Other values (8) 10 1.2%
 
(Missing) 764 91.5%
 
Minimum 5 values
Value Count Frequency (%)
1 15 1.8%
 
2 9 1.1%
 
3 10 1.2%
 
4 6 0.7%
 
5 4 0.5%
 
Maximum 5 values
Value Count Frequency (%)
22 1 0.1%
 
21 2 0.2%
 
19 2 0.2%
 
18 1 0.1%
 
16 4 0.5%
 

STDs: Time since last diagnosis
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 18
Unique (%): 25.4%
Missing: 764
Missing (%): 91.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.816901408450704
Minimum: 1.0
Maximum: 22.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 7.5
95-th percentile: 18.5
Maximum: 22
Range: 21
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 5.755270526
Coefficient of variation (CV): 0.9894048605
Kurtosis: 1.016953296
Mean: 5.816901408
Median Absolute Deviation (MAD): 4.472128546
Skewness: 1.411204173
Sum: 413
Variance: 33.12313883
Histogram with fixed size bins (bins=10)
Common values
Value Count Frequency (%)
1 17 2.0%
 
2 10 1.2%
 
3 9 1.1%
 
4 6 0.7%
 
7 5 0.6%
 
16 4 0.5%
 
5 3 0.4%
 
6 3 0.4%
 
8 3 0.4%
 
21 2 0.2%
 
Other values (8) 9 1.1%
 
(Missing) 764 91.5%
 
Minimum 5 values
Value Count Frequency (%)
1 17 2.0%
 
2 10 1.2%
 
3 9 1.1%
 
4 6 0.7%
 
5 3 0.4%
 
Maximum 5 values
Value Count Frequency (%)
22 1 0.1%
 
21 2 0.2%
 
19 1 0.1%
 
18 1 0.1%
 
16 4 0.5%
 

Dx:Cancer
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 817 97.8%
 
1 18 2.2%
 

Dx:CIN
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 826 98.9%
 
1 9 1.1%
 

Dx:HPV
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 817 97.8%
 
1 18 2.2%
 

Dx
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 811 97.1%
 
1 24 2.9%
 

Hinselmann
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 800 95.8%
 
1 35 4.2%
 

Schiller
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 762 91.3%
 
1 73 8.7%
 

Citology
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 792 94.9%
 
1 43 5.1%
 

Biopsy
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.6 KiB
Value Count Frequency (%)
0 781 93.5%
 
1 54 6.5%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
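
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report was generated with; `pearson_r` and the sample data are hypothetical. The 1/n (or 1/(n-1)) factors in the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of products of deviations; the shared normalisation factor cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r_original, r_rescaled)
```

The second call illustrates the location and scale invariance mentioned above.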

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
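
A minimal sketch of that recipe follows: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The function names and data are hypothetical, and giving tied values the average of their positions is an assumption (it is the common convention).

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this example ρ reaches 1 while r stays below 1, which is the sense in which ρ catches nonlinear monotonic relationships.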

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
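
The pair-counting definition can be sketched directly. The version below implements exactly the formula stated here, which is usually called tau-a; library implementations often default to a tie-corrected variant (tau-b), so results can differ when ties are present. `kendall_tau_a` and the sample data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 is a tie: it counts in the total but in neither group
    return (concordant - discordant) / len(pairs)

# Hypothetical data: one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.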

Missing values

Sample

First rows

,Age,Number of sexual partners,First sexual intercourse,Num of pregnancies,Smokes,Smokes (years),Smokes (packs/year),Hormonal Contraceptives,Hormonal Contraceptives (years),IUD,IUD (years),STDs,STDs (number),STDs:condylomatosis,STDs:cervical condylomatosis,STDs:vaginal condylomatosis,STDs:vulvo-perineal condylomatosis,STDs:syphilis,STDs:pelvic inflammatory disease,STDs:genital herpes,STDs:molluscum contagiosum,STDs:AIDS,STDs:HIV,STDs:Hepatitis B,STDs:HPV,STDs: Number of diagnosis,STDs: Time since first diagnosis,STDs: Time since last diagnosis,Dx:Cancer,Dx:CIN,Dx:HPV,Dx,Hinselmann,Schiller,Citology,Biopsy
0,18,4.0,15.0,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
1,15,1.0,14.0,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
2,34,1.0,NaN,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
3,52,5.0,16.0,4.0,1.0,37.000000,37.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,1,0,1,0,0,0,0,0
4,46,3.0,21.0,4.0,0.0,0.000000,0.0,1.0,15.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
5,42,3.0,23.0,2.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
6,51,3.0,17.0,6.0,1.0,34.000000,3.4,0.0,0.0,1.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,1,1,0,1
7,26,1.0,26.0,3.0,0.0,0.000000,0.0,1.0,2.0,1.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
8,45,1.0,20.0,5.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,1,0,1,1,0,0,0,0
9,44,3.0,15.0,NaN,1.0,1.266973,2.8,0.0,0.0,NaN,NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0

Last rows

,Age,Number of sexual partners,First sexual intercourse,Num of pregnancies,Smokes,Smokes (years),Smokes (packs/year),Hormonal Contraceptives,Hormonal Contraceptives (years),IUD,IUD (years),STDs,STDs (number),STDs:condylomatosis,STDs:cervical condylomatosis,STDs:vaginal condylomatosis,STDs:vulvo-perineal condylomatosis,STDs:syphilis,STDs:pelvic inflammatory disease,STDs:genital herpes,STDs:molluscum contagiosum,STDs:AIDS,STDs:HIV,STDs:Hepatitis B,STDs:HPV,STDs: Number of diagnosis,STDs: Time since first diagnosis,STDs: Time since last diagnosis,Dx:Cancer,Dx:CIN,Dx:HPV,Dx,Hinselmann,Schiller,Citology,Biopsy
825,31,3.0,18.0,1.0,0.0,0.0,0.00,1.0,0.50,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
826,32,3.0,18.0,1.0,1.0,11.0,0.16,1.0,6.00,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0,NaN,NaN,1,0,1,0,0,0,0,0
827,19,1.0,14.0,0.0,0.0,0.0,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
828,23,2.0,15.0,2.0,0.0,0.0,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
829,43,3.0,17.0,3.0,0.0,0.0,0.00,1.0,5.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
830,34,3.0,18.0,0.0,0.0,0.0,0.00,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
831,32,2.0,19.0,1.0,0.0,0.0,0.00,1.0,8.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
832,25,2.0,17.0,0.0,0.0,0.0,0.00,1.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,1,0
833,33,2.0,24.0,2.0,0.0,0.0,0.00,1.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0
834,29,2.0,20.0,1.0,0.0,0.0,0.00,1.0,0.50,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,NaN,NaN,0,0,0,0,0,0,0,0